MongoDB Team's View | Problems



• Development speed: break with the 40-year-old relational paradigm;
• Scale: adapted to new hardware (parallelism, multicore, cloud, etc.);
• Scalability issues: adapt to "Big Data";
• Complex data: object-to-object mapping, polymorphism.



MongoDB Team's View | Scaling alternatives



Vertically: small box -> bigger box.
One problem or failure means the whole system is down.

Horizontally: small box -> more boxes.
Failures have to be handled while the system keeps working, but it may not be safe.



MongoDB Team's View | Guiding concepts



"How can we propose a new data model that is useful and functional
in light of modern programming paradigms?"

Key/value? Too limited.
XML? Neither intuitive nor agile enough.
Document-oriented, non-relational?

JSON - JavaScript Object Notation

• Standardized by an RFC, which guarantees portability

MongoDB provides a JavaScript-based console (mongo) for management and direct access to the data.

The MongoDB query language is JSON itself.



JSON Datatypes: 

1. null 

2. number 

3. string 

4. boolean 

5. objects/documents 

6. arrays 



Basic structure of MongoDB



Relational    ->  MongoDB
Database      ->  Database
Table         ->  Collection
Row           ->  Document (Object)



Dot Notation || Mongo Query Language 

db.collection.method( { parameters (JSON) } );



CRUD | A sample document! 



The following query:

db.mycollection.insert( { "_id" : "Q33", "x" : 3, "y" : "abc", "z" : [1, 2] } );

Will produce the following result in the database:

{
  _id: "Q33",
  x: 3,
  y: "abc",
  z: [1, 2]
}

To retrieve this object, only the following query is needed:

db.mycollection.find( { "_id" : "Q33" } );



CRUD | A sample document! 



One possible way to represent this object in a relational database:

Table T:

p     x   y
Q33   3   "abc"

Table T_Z:

p     z
Q33   1
Q33   2

To retrieve this object, the following query would be needed:

select * from T, T_Z where T.P = T_Z.P and T.P = 'Q33';

or

select * from T join T_Z on (T.P = T_Z.P) where T.P = 'Q33';



CRUD | Creating a document 



db.users.insert(          // collection
  {
    name: "sue",          // field: value
    age: 26,              // field: value
    status: "A"           // field: value
  }                       // document
)



Analogous SQL query:



INSERT INTO users                 -- table
  ( name, age, status )           -- columns
VALUES ( 'sue', 26, 'A' )         -- values/row



CRUD | Updating a document 



db.users.update(                  // collection
  { age: { $gt: 18 } },           // update criteria
  { $set: { status: "A" } },      // update action
  { multi: true }                 // update option
)



Analogous SQL query:



UPDATE users                      -- table
SET status = 'A'                  -- update action
WHERE age > 18                    -- update criteria



CRUD | Updates & moves 



Update operations can increase the size of a document. If a document outgrows its
currently allocated record space, MongoDB must allocate new space and move the
document to the new location.

To reduce the number of moves, MongoDB includes a small amount of extra space, or
padding, when allocating the record space. This padding reduces the likelihood that a
slight increase in document size will cause the document to exceed its allocated record
size.

"Records and documents are almost the same thing,
but records have some extra space at the end,
since a record actually contains the document!"

Record:

[ [document] + [free space (padding)] ]

Note: MongoDB has a 16 MB limit per document. To store larger files, MongoDB
provides GridFS.



CRUD | Updates 



The instructions: 

t = db.mycollection;
t.insert( { _id: 2, "z": 17 } );
t.update( { _id: 2 }, { $push: { "array": 14 } } )
t.update( { _id: 2 }, { $push: { "array": "fifteen" } } )
t.update( { _id: 2 }, { $push: { "array": "fifteen" } } )
t.update( { _id: 2 }, { $addToSet: { "array": "16" } } )
t.update( { _id: 2 }, { $addToSet: { "array": "16" } } )
t.update( { _id: 2 }, { $addToSet: { "array": "16" } } )

Will result in:

{ "_id" : 2, "array" : [ 14, "fifteen", "fifteen", "16" ], "z" : 17 }
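The difference between the two operators can be modeled without a database. The following is a minimal sketch in plain JavaScript, assuming primitive array values only; `push` and `addToSet` are hypothetical helper names chosen for illustration, not MongoDB driver functions.

```javascript
// Plain-JavaScript model of the two array update operators (no MongoDB needed).
// `push` and `addToSet` are hypothetical helpers, not driver functions.
function push(doc, field, value) {
  // $push always appends, creating the array if the field is missing.
  if (!Array.isArray(doc[field])) doc[field] = [];
  doc[field].push(value);
  return doc;
}

function addToSet(doc, field, value) {
  // $addToSet appends only when no equal element is already present
  // (this sketch compares primitives only).
  if (!Array.isArray(doc[field])) doc[field] = [];
  if (!doc[field].includes(value)) doc[field].push(value);
  return doc;
}

const doc = { _id: 2, z: 17 };
push(doc, "array", 14);
push(doc, "array", "fifteen");
push(doc, "array", "fifteen"); // duplicate kept
addToSet(doc, "array", "16");
addToSet(doc, "array", "16"); // duplicate skipped
console.log(JSON.stringify(doc.array));
```

Note that $push happily stores "fifteen" twice, while repeating $addToSet with an existing value leaves the array unchanged.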



References



References provide a "normalized" structure for MongoDB documents.



user document

{
  _id: <ObjectId1>,
  username: "123xyz"
}

contact document

{
  _id: <ObjectId2>,
  user_id: <ObjectId1>,
  phone: "123-456-7890",
  email: "xyz@example.com"
}

access document

{
  _id: <ObjectId3>,
  user_id: <ObjectId1>,
  level: 5,
  group: "dev"
}



Embedded data 



Embedded documents capture relationships between data by storing related data in a
single document structure.



{
  _id: <ObjectId1>,
  username: "123xyz",
  contact: {                      // embedded sub-document
    phone: "123-456-7890",
    email: "xyz@example.com"
  },
  access: {                       // embedded sub-document
    level: 5,
    group: "dev"
  }
}
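To make the trade-off between the two models concrete, here is a hedged sketch in plain JavaScript that turns the referenced form into the embedded form. The in-memory arrays, the string ids (standing in for ObjectId values), and the `embedUser` helper are all assumptions made for illustration.

```javascript
// Hypothetical in-memory data mirroring the referenced documents;
// the string ids stand in for ObjectId values.
const users = [{ _id: "id1", username: "123xyz" }];
const contacts = [
  { _id: "id2", user_id: "id1", phone: "123-456-7890", email: "xyz@example.com" },
];
const accesses = [{ _id: "id3", user_id: "id1", level: 5, group: "dev" }];

// Resolve the references for one user and return the embedded form.
function embedUser(user) {
  // Drop the linking fields: they are implied once the data is nested.
  const strip = ({ _id, user_id, ...rest }) => rest;
  const contact = contacts.find((c) => c.user_id === user._id);
  const access = accesses.find((a) => a.user_id === user._id);
  return {
    ...user,
    contact: contact ? strip(contact) : null,
    access: access ? strip(access) : null,
  };
}

const embedded = embedUser(users[0]);
console.log(JSON.stringify(embedded, null, 2));
```

With references, the application performs these lookups on every read; with embedding, the work is done once at write time and a single read returns everything.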



Indexing 



MongoDB indexes use a B-tree data structure. In other words, an index in MongoDB is a
B-tree whose entries map a key to a document location. A key may be defined in either
ascending or descending order, and a single-field index serves both ways, since all MongoDB
has to do is choose whether to read the tree from left to right or from right to left.

• The _id index is created implicitly; all other indexes must be declared explicitly;

• Can index array contents;

• Can index subdocuments and subfields;

• Indexed values may be of any type (string, integer, etc.);

• Allows multi-part (compound) indexes.
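The claim that one index serves both sort directions can be illustrated with a toy model. In this sketch a sorted array of [key, location] pairs stands in for the B-tree leaves; `buildIndex` and `scan` are hypothetical names, and a real B-tree is of course more elaborate.

```javascript
// A sorted array of [key, location] pairs stands in for the B-tree leaves.
// `buildIndex` and `scan` are hypothetical names used only for this sketch.
function buildIndex(docs, field) {
  return docs
    .map((doc, location) => [doc[field], location])
    .sort((a, b) => a[0] - b[0]); // numeric keys only, ascending
}

function scan(index, direction) {
  // direction 1 reads left to right, -1 reads right to left.
  const entries = direction === 1 ? index : [...index].reverse();
  return entries.map(([key]) => key);
}

const people = [{ age: 24 }, { age: 57 }, { age: 19 }];
const ageIndex = buildIndex(people, "age");
console.log(scan(ageIndex, 1)); // ascending order
console.log(scan(ageIndex, -1)); // descending order from the same index
```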



Types of cursors used by query operations, as reported by explain():

• BasicCursor indicates a full collection scan.

• BtreeCursor indicates that the query used an index. The cursor includes the name of the
index. When a query uses an index, the output of explain() includes indexBounds details.

• GeoSearchCursor indicates that the query used a geospatial index.

How to create an index:



db.collection.ensureIndex( { parameters } );



Aggregation framework 



Aggregation is a multi-stage pipeline that transforms the documents into an aggregated
result, loosely comparable to a WHERE plus GROUP BY query in the relational world. By
analogy with Unix pipes, it would look like the following:



aggregate X | match | project | group



db.orders.aggregate( [                                             // collection
  { $match: { status: "A" } },                                     // $match phase
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }      // $group phase
] )

orders:

{ cust_id: "A123", amount: 500, status: "A" }
{ cust_id: "A123", amount: 250, status: "A" }
{ cust_id: "B212", amount: 200, status: "A" }
{ cust_id: "A123", amount: 300, status: "D" }

After $match:

{ cust_id: "A123", amount: 500, status: "A" }
{ cust_id: "A123", amount: 250, status: "A" }
{ cust_id: "B212", amount: 200, status: "A" }

After $group (results):

{ _id: "A123", total: 750 }
{ _id: "B212", total: 200 }
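The two stages can be mimicked in plain JavaScript to show what each one contributes. This is only a sketch over an in-memory array modeled on the sample orders (amounts 500, 250, 200 and 300); it says nothing about how MongoDB executes the pipeline internally.

```javascript
const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

// $match stage: keep only the documents with status "A".
const matched = orders.filter((o) => o.status === "A");

// $group stage: one bucket per cust_id, summing the amounts.
const totals = new Map();
for (const o of matched) {
  totals.set(o.cust_id, (totals.get(o.cust_id) || 0) + o.amount);
}

// Shape the buckets like the pipeline output.
const results = [...totals].map(([_id, total]) => ({ _id, total }));
console.log(results);
```

Filtering before grouping matters: the order with status "D" never reaches the grouping step, so it does not count toward the A123 total.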



Map/reduce 



Map/Reduce is more flexible than the aggregation operations, since map and reduce are
arbitrary JavaScript functions. One can also use the already available Hadoop connector
to run map/reduce operations over MongoDB data.

db.orders.mapReduce(                                               // collection
  function() { emit( this.cust_id, this.amount ); },               // map
  function(key, values) { return Array.sum( values ) },            // reduce
  {
    query: { status: "A" },                                        // query
    out: "order_totals"                                            // output
  }
)

orders:

{ cust_id: "A123", amount: 500, status: "A" }
{ cust_id: "A123", amount: 250, status: "A" }
{ cust_id: "B212", amount: 200, status: "A" }
{ cust_id: "A123", amount: 300, status: "D" }

After query:

{ cust_id: "A123", amount: 500, status: "A" }
{ cust_id: "A123", amount: 250, status: "A" }
{ cust_id: "B212", amount: 200, status: "A" }

After map:

{ "A123": [ 500, 250 ] }
{ "B212": 200 }

After reduce (order_totals):

{ _id: "A123", value: 750 }
{ _id: "B212", value: 200 }
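The query, map and reduce phases can be traced with a small in-memory stand-in. This is a sketch under stated assumptions: `mapReduce` here is a hypothetical local function, not the shell method, and `emit` is passed to the map function as an argument, whereas in the mongo shell it is a global.

```javascript
// Hypothetical in-memory stand-in for db.orders.mapReduce(...).
function mapReduce(docs, map, reduce, options) {
  const groups = new Map();
  // emit() collects every value under its key, as the map phase does.
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.filter(options.query).forEach((doc) => map.call(doc, emit));
  // A key with a single value is passed through without calling reduce.
  return [...groups].map(([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

const orderDocs = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

const orderTotals = mapReduce(
  orderDocs,
  function (emit) { emit(this.cust_id, this.amount); }, // map
  (key, values) => values.reduce((sum, v) => sum + v, 0), // reduce
  { query: (doc) => doc.status === "A" } // query
);
console.log(orderTotals);
```

Because a key with one value skips the reduce step, the reduce function should return a value of the same shape as the values it receives.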



Key features 



Two basic features of MongoDB:
Sharding & ReplicaSets

Splitting collections & duplicating data

Across the members of a replica set, you will have the same documents.
Across different shards, you will have different documents.

ReplicaSet:

1. Redundant copies of data;

2. Replicas, copies, backups;

3. Data safety (ds);

4. High availability (ha);

5. Disaster recovery (dr). 

Shards: 

1. Unique data; 

2. Scalability; 

3. Data partition. 



ReplicaSets 



A ReplicaSet (rs) in MongoDB is a group of mongod processes that provide redundancy and
high availability. The members of a replica set are:

Primary - receives all write operations.

Secondary(ies) - replicate operations from the primary to maintain an identical data set.
Secondaries may have additional configurations for special usage profiles.
For example, members may be non-voting, be targeted for reads through a read preference, or act as arbiters.



[Figure: the client application's driver sends writes and reads to the primary of a replica set that also has two secondaries.]



Showing how simple MongoDB is at the console! 



pedro@pedro-notebook ~ $ mon<TAB>
mongo         mongoexport   mongoperf     mongostat     montage.im6
mongoconsole  mongofiles    mongorestore  mongotop
mongod        mongoimport   mongos        mono
mongodump     mongooplog    mongosniff    montage
pedro@pedro-notebook ~ $ mongod --dbpath /opt/mongo/data




Showing how simple MongoDB is at the console!



pedro@pedro-notebook ~ $ mongo localhost
MongoDB shell version: 2.4.6
connecting to: localhost
> db.stats();
{
        "db" : "localhost",
        "collections" : 0,
        "objects" : 0,
        "avgObjSize" : 0,
        "dataSize" : 0,
        "storageSize" : 0,
        "numExtents" : 0,
        "indexes" : 0,
        "indexSize" : 0,
        "fileSize" : 0,
        "nsSizeMB" : 0,
        "dataFileVersion" : {

        },
        "ok" : 1
}
>





Arbiter 



An arbiter does not have a copy of the data set and cannot become a primary. Replica sets may
have arbiters to add a vote in elections for primary. Arbiters allow replica sets to have an
odd number of voting members, without the overhead of a member that replicates data.



[Figure: a secondary and an arbiter exchanging heartbeats.]



• Replicas have a priority level. Initially, all replicas have priority set to 1 (except arbiters).

• A member may have its priority set to 0, in which case it can never be elected primary.

• The member with the highest priority becomes primary.

• One member may have more than one vote.
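The priority rules listed above can be sketched as a simple selection function. This is a deliberately simplified model: the member list, the `up` flag and `pickPrimary` are assumptions for illustration, and real elections also involve voting and how up to date each member is, which this sketch ignores.

```javascript
// Hypothetical member descriptions; only the rules listed above are modeled.
const members = [
  { host: "a:27017", priority: 1, arbiter: false, up: false },
  { host: "b:27017", priority: 2, arbiter: false, up: true },
  { host: "c:27017", priority: 0, arbiter: false, up: true },
  { host: "d:27017", priority: 0, arbiter: true, up: true },
];

function pickPrimary(candidates) {
  // Arbiters and priority-0 members can never become primary.
  const eligible = candidates.filter(
    (m) => m.up && !m.arbiter && m.priority > 0
  );
  if (eligible.length === 0) return null; // no primary: writes are not accepted
  // Among the eligible members, the highest priority wins.
  return eligible.reduce((best, m) => (m.priority > best.priority ? m : best));
}

console.log(pickPrimary(members).host);
```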



Elections 



Elections are essential for independent operation of a replica set. However, elections take time
to complete. While an election is in progress, the replica set has no primary and cannot accept
writes. MongoDB avoids elections unless necessary.



[Figure: when the primary becomes unavailable, the two secondaries, which exchange heartbeats, hold an election for a new primary. Once the new primary is elected, it replicates to the remaining secondary and the two keep exchanging heartbeats.]



Replication and Optime 



By default, replication in MongoDB is asynchronous, for efficiency: waiting for distant
secondaries would add network latency to every write. Usually, the scenario is as follows:

Client -- writes --> Master/Primary -- replicates --> Slave/Secondary
Client <-- acknowledges -- Master/Primary



Optime is a tuple used by mongod to register the write operations in the database, with two 32-bit fields:

<time>, <ordinal>

So, for instance, you might have something like:

8nov2013 09:43:23 AM, 0
8nov2013 09:43:23 AM, 1
8nov2013 09:43:23 AM, 2

8nov2013 09:43:24 AM, 0
8nov2013 09:43:24 AM, 1



Note: since the ordinal is a 32-bit value, MongoDB cannot register more than about 4 billion (2^32) operations per second!

Since each server keeps its own optime register, it is possible to tell how large the lag between mongod
instances is, both in terms of time and number of operations.
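As a rough illustration of how two optimes can be ordered and compared, the sketch below models an optime as a `{ time, ordinal }` object. The object shape, the helper names and the sample values are assumptions for this example; it does not reproduce mongod's internal representation.

```javascript
// An optime modeled as { time, ordinal }: seconds since the epoch plus a
// per-second counter. The values below are made up for illustration.
function compareOptime(a, b) {
  // Order by time first, then by ordinal within the same second.
  return a.time !== b.time ? a.time - b.time : a.ordinal - b.ordinal;
}

function lagSeconds(primary, secondary) {
  return primary.time - secondary.time;
}

const primaryOptime = { time: 1383903804, ordinal: 1 };
const secondaryOptime = { time: 1383903803, ordinal: 2 };

// A positive comparison means the primary is ahead of the secondary.
console.log(compareOptime(primaryOptime, secondaryOptime) > 0);
console.log(lagSeconds(primaryOptime, secondaryOptime));
```

Comparing time before ordinal is what makes an operation with a lower ordinal in a later second still count as newer.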



Sharding 



Sharding divides the data set and distributes the data over multiple servers, or shards. 
Each shard is an independent database, and collectively, the shards make up a single 
logical database. 



db.users, distributed over Shard A | Shard B | Shard C | Shard D:

{
  _id: 1,
  name: "mario",
  likes: ["ski", "soccer", "swimming", "judo"],
  age: 19
}
{
  _id: 2,
  name: "sonia",
  likes: ["basketball", "tennis", "dance"],
  age: 24
}
{
  _id: 143565,
  name: "albert",
  likes: ["malakamb"],
  age: 57
}



Sharding key 



To shard a collection, you need to select a shard key. It is an indexed field or an
indexed compound field. The shard key values are divided into chunks and
distributed evenly across the shards. To divide the shard key values into chunks,
MongoDB uses either range-based partitioning (similar to Google's BigTable concept) or
hash-based partitioning.



db.users, distributed over Shard A | Shard B | Shard C | Shard D:

{ _id: 1, name: "mario", likes: ["ski", "soccer", "swimming", "judo"], age: 19 }
{ _id: 2, name: "sonia", likes: ["basketball", "tennis", "dance"], age: 24 }
{ _id: 143565, name: "albert", likes: ["malakamb"], age: 57 }

One could think about sharding such a collection in different ways. Either _id,
name or age would be a good candidate.



Key space for x:

Chunk 1: { x: minKey } to { x: -75 }
Chunk 2: { x: -74 } to { x: 24 }
Chunk 3: { x: 25 } to { x: 175 }
Chunk 4: { x: 176 } to { x: maxKey }
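Range-based partitioning amounts to finding the chunk whose range contains a given key. The sketch below assumes integer keys and the chunk boundaries from the key-space figure; the chunk-to-shard assignment and the `shardForKey` helper are made up for illustration.

```javascript
// Chunk boundaries follow the key-space figure (inclusive, integer keys);
// the chunk-to-shard assignment is a made-up example.
const chunks = [
  { min: -Infinity, max: -75, shard: "A" }, // Chunk 1
  { min: -74, max: 24, shard: "B" }, // Chunk 2
  { min: 25, max: 175, shard: "C" }, // Chunk 3
  { min: 176, max: Infinity, shard: "D" }, // Chunk 4
];

// Range-based partitioning: find the chunk whose range contains the key.
function shardForKey(x) {
  const chunk = chunks.find((c) => x >= c.min && x <= c.max);
  return chunk ? chunk.shard : null;
}

console.log(shardForKey(-100), shardForKey(0), shardForKey(25), shardForKey(1000));
```

Nearby keys land in the same chunk under this scheme, which is good for range queries; hash-based partitioning would instead look up a hash of the key, trading that locality for a more even spread.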



Sharding + ReplicaSets!



In a realistic environment, we will have a MongoDB router filtering and centralizing accesses
to the database. This is a lightweight process called mongos.

mongos communicates with another kind of process, the config server, which is a light
mongod process that holds no actual data, only metadata about the logical structure of the
entire cluster. Using that metadata, mongos sends each request to the appropriate mongod
instance in the shards/replica sets.



[Figure: app servers, each with a router (mongos); 2 or more routers; 3 config servers; 2 or more shards, each of them a replica set.]



Security 



Possible strategies: 

1) Maintain a "trusted environment", where all relevant TCP ports are locked down at the
network layer;

2) Use MongoDB authentication:

2.1) Using --auth to secure client access, through a user/password strategy;

2.2) Using --keyFile for intra-cluster security, ensuring that all servers that compose
the cluster are genuine and can therefore be trusted.

3) On top of that, one might use SSL to encrypt the messages exchanged
within the cluster. To do so, however, MongoDB must be compiled specifically
for that, using the --ssl parameter.



Geospatial indexes 



• 2D only

• Additional attribute ('compound')

Suppose you have a collection named "places" that looks like the following:

{
  _id: ...,
  loc: [20.8, 43.1],
  type: 'coffee'
}

To optimize access, one could add an index to this collection:

db.places.ensureIndex( { "loc": "2d" } )

A query such as:

db.places.find( { loc: { $near: [20, 40], $maxDistance: 5 } } )

would then return all documents within a maximum distance of 5 from the point [20, 40], measured in the same units as the coordinates.

There is also a geoNear command, as well as a $within operator that allows queries over shapes:

$within: { $center | $box | $polygon }

MongoDB also supports a spherical: true parameter to treat coordinates as spherical, for points on the surface of
the Earth, for example.
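What a flat $near query with $maxDistance computes can be approximated in a few lines. The sketch below assumes plain Euclidean distance on [x, y] pairs and an in-memory array; the sample documents and the `near` helper are hypothetical, and no spherical geometry or index is involved.

```javascript
// Hypothetical documents shaped like the "places" collection.
const places = [
  { _id: 1, loc: [20.8, 43.1], type: "coffee" },
  { _id: 2, loc: [21.0, 40.5], type: "bakery" },
  { _id: 3, loc: [60.0, 10.0], type: "coffee" },
];

// Flat (non-spherical) distance between two [x, y] points.
function distance([x1, y1], [x2, y2]) {
  return Math.hypot(x2 - x1, y2 - y1);
}

// Keep the points within maxDistance of `center`, nearest first.
function near(docs, center, maxDistance) {
  return docs
    .map((doc) => ({ doc, d: distance(doc.loc, center) }))
    .filter(({ d }) => d <= maxDistance)
    .sort((a, b) => a.d - b.d)
    .map(({ doc }) => doc);
}

console.log(near(places, [20, 40], 5).map((p) => p._id));
```

This version scans every document; the point of a 2d index is to avoid that scan by narrowing the search to nearby locations.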



MongoDB free online courses 



Courses: https://education.mongodb.com/courses




Courses currently available:

1. MongoDB for Java Developers

2. MongoDB for Node.js Developers 

3. MongoDB for Developers 

4. MongoDB for DBAs 



mongoDB 



Thank you!

Pedro Guimarães 
pedrodpq(a)lncc.br 



